Efficient Algorithms for Discovering Frequent and Maximal Substructures from Large Semistructured Data
نویسنده
چکیده
In this paper, we review recent advances in efficient algorithms for semi-structured data mining , that is, discovery of rules and patterns from structured data such as sets, sequences, trees, and graphs. After introducing basic definitions and problems, We present efficent algorithms for frequent and maximal pattern mining for classes of sets, sequences, and trees. In particular, we explain general techniques, called the rightmost expansion and PPC-extension, which are powerful tools for designing efficient algorithms. We also give examples of applications of semi-structured data mining to real world data.
منابع مشابه
Discovering Frequent Substructures in Large Unordered Trees
In this paper, we study a frequent substructure discovery problem in semi-structured data. We present an efficient algorithm Unot that computes all frequent labeled unordered trees appearing in a large collection of data trees with frequency above a user-specified threshold. The keys of the algorithm are efficient enumeration of all unordered trees in canonical form and incremental computation ...
متن کاملFRACTURE mining: Mining frequently and concurrently mutating structures from historical XML documents
In the past few years, the fast proliferation of available XML documents has stimulated a great deal of interest in discovering hidden and nontrivial knowledge from XML repositories. However, to the best of our knowledge, none of existing work on XML mining has taken into account the dynamic nature of XML documents as online information. The present article proposes a novel type of frequent pat...
متن کاملA comprehensive method for discovering the maximal frequent set
The association rule mining can be divided into two steps.The first step is to find out all frequent itemsets, whose occurrences are greater than or equal to the user-specified threshold.The second step is to generate reliable association rules based on all frequent itemsets found in the first step. Identifying all frequent itemsets in a large database dominates the overall performance in the a...
متن کاملYAFIMA: Yet Another Frequent Itemset Mining Algorithm
Efficient discovery of frequent patterns from large databases is an active research area in data mining with broad applications in industry and deep implications in many areas of data mining. Although many efficient frequent-pattern mining techniques have been developed in the last decade, most of them assume relatively small databases, leaving extremely large but realistic datasets out of reac...
متن کاملKnowledge Discovery for Sensor Network Comprehension
During the past decade, we have witnessed an explosive growth in our capabilities to both generate and collect data. Various data mining techniques have been proposed and widely employed to discover valid, novel and potentially useful patterns in these data. Data mining involves the discovery of patterns, associations, changes, anomalies, and statistically significant structures and events in h...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2010